Most Downloaded Items Last Week

  1. Crawldata from 2007-02-05T00:10:48PDT to 2007-02-05T00:39:43PDT
    137,032 downloads
  2. Crawldata from 2005-07-16T10:55:11PDT to 2005-07-16T19:59:51PDT
    127,960 downloads
  3. Crawldata from Alexa Internet from 2003-10-26T18:36:36PDT to 2003-10-28T11:21:17PDT
    123,082 downloads
  4. Liveweb Capture 2011-03-27T22:10:09PDT to 2011-03-28T05:27:05PDT
    123,040 downloads
  5. Crawldata from Internet Archive from 2003-03-29T18:35:25PDT to 2003-11-16T08:00:57PDT
    96,400 downloads

Most Downloaded Items

  1. Webwide Crawldata 2012-01-21T04:01:50PST to 2012-01-20T21:01:34PST
    1,600,752 downloads
  2. Liveweb Capture 2011-03-27T22:10:09PDT to 2011-03-28T05:27:05PDT
    1,233,855 downloads
  3. Crawldata from Internet Archive from 2007-08-01T19:54:22PDT to 2007-08-01T22:21:06PDT
    924,791 downloads
  4. Liveweb Capture 2013-03-29T09:35:58 UTC to 2013-03-29T15:03:34 UTC
    760,440 downloads
  5. Crawldata from Internet Archive from 2007-07-09T02:04:40PDT to 2007-07-09T04:14:42PDT
    742,810 downloads

Spotlight Item

Webwide Crawldata 2012-01-21T04:01:50PST to 2012-01-20T21:01:34PST
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Sat Jan 21 04:01:50 PST 2012 to Fri Jan 20 21:01:34 PST 2012.

About the Internet Archive

Background

Frequently Asked Questions

Welcome to Web Crawls
1,784,016 items

The Web Archive of the Internet Archive, started in late 1996, is made available through the Wayback Machine, and some collections are available in bulk to researchers.

Other than the pages collected by the Internet Archive, major contributors include Alexa Internet, Cuil, and those listed below.

Sub-Collections

Accelovation Crawl
Web crawl snapshots generously donated from Accelovation. This data is currently not publicly accessible. From the site: Accelovation is pioneering the delivery of Insight Discovery™ software...
1,324 items
Alexa Crawls
Crawl data donated by Alexa Internet. This data is currently not publicly accessible. Decryption Keys are kept in an item. Alexa is the leading provider of free, global web metrics. Search Alexa to...
92,090 items
Archive Team
Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the...
14,102 items
Archive-It Digital Collection
The Archive-It Digital Collection
53,509 items
Away From Keyboard
Away From Keyboard is a memorial collection dedicated to preserving pieces of lives lived online from being scattered and lost. While no collection of data can ever replace a person, these archives...
293 items
collections-aaron-swartz
from Wikipedia: Aaron Hillel Swartz (November 8, 1986 – January 11, 2013) was an American computer programmer, writer, political organizer and Internet activist. Swartz was involved in the...
2 items
Common Crawl
Web crawl data from Common Crawl.
440 items
Cuil Crawl Data
Web crawl snapshot generously donated from cuil.com. This collection of pages, mostly from 2007 with some from 2008, is about 310 terabytes of compressed data and almost 60 billion URLs (mostly text)....
26,494 items
Custom Crawl Services
National library harvesting.
25,208 items
Focused Crawls
Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
9,958 items
4 items
httparchive
Successful societies and institutions recognize the need to record their history - this provides a way to review the past, find explanations for current behavior, and spot emerging trends. In...
27 items
Institut national de l’audiovisuel
Crawl data from Institut national de l’audiovisuel in France. This data is currently not publicly accessible. from Wikipedia: The Institut national de l'audiovisuel (or INA, French for National...
50 items
Internet Archive Web Crawls
Crawl data collected by the Internet Archive. This data is currently not publicly accessible in this format. To view archived web pages, please visit the Wayback Machine.
377,622 items
Internet Memory Foundation
Data crawled on behalf of Internet Memory Foundation. This data is currently not publicly accessible. from Wikipedia: The Internet Memory Foundation (formerly the European Archive Foundation) is a...
50 items
Mercator Crawl
Crawl done with the DEC/HP-labs 'Mercator' crawler and converted to ARC format. This data is currently not publicly accessible.
1 item
Rescue Crawls
Rescue crawls conducted by the public for sites that have announced that they are closing.
2 items
Thumper Transfer
Web crawl data transferred from thumpers in Santa Clara data center.
urlteam Web Crawls
Crawl data collected by the urlteam. The URLTeam is the ArchiveTeam subcommittee on URL shorteners. We believe that they pose a serious threat to the internet's integrity. If one of them dies, gets...
4 items
Web Collections
Web Collections organized by year. Some of this data is currently not publicly accessible.
19 items
web-group-internal
miscellaneous data
28,760 items
Wiki Collections
Collections of Wiki data
48,087 items
Wikileaks.org Archive
A collection of web pages from the wikileaks websites as well as news coverage and commentary surrounding the Wikileaks releases. It includes coverage of the Afghan war diaries, the Iraq war logs,...
3 items

Recently Reviewed Items

ArchiveTeam JSON Download of Twitter Stream: 2012-12
Average rating: 4.00 out of 5 stars

AIT-1216 Crawldata 2009-03-30T18:30:53PDT to 2008-11-08T20:37:09PST
Average rating: 5.00 out of 5 stars

IMSLP Petrucci Music Library Data Dump - 20121202
Average rating: 5.00 out of 5 stars

Communication issues musings of a dinosaur
Average rating: 5.00 out of 5 stars

www.theregister.co.uk/2010 201210 panic download
Average rating:

This Just In

YouTube Video Crawldata 2013-06-18T21:18:47PDT to 2013-06-18T14:43:14PDT
26 minutes ago

Webwide Crawldata 2013-06-16T21:44:27PDT to 2013-06-16T15:52:49PDT
27 minutes ago

NLS_2013 Crawldata 2013-06-17T04:16:17PDT to 2013-06-17T06:56:06PDT
29 minutes ago

alexa20130618-23
30 minutes ago

NLS_2013 Crawldata 2013-06-13T04:04:03PDT to 2013-06-12T21:35:55PDT
32 minutes ago


 

Wayback Machine Forum

Subject Poster Replies Date
I also want to add trportal 0 June 17, 2013 04:33:30am
Archieve of my post about komedo hadingrh 0 June 15, 2013 08:04:08am
The whole Stylus Magazine website is down! angeldeb82 0 June 13, 2013 04:19:00pm
Wayback Machine missing crawls John21Allen 0 June 12, 2013 06:59:33am
XFM New Rock 22 2000 cops 0 June 11, 2013 02:54:29pm
Hi please ad my site www.sunika.co.za sunika 1 June 11, 2013 12:34:14am
   Re: Hi please ad my site www.sunika.co.za Nutri4verve 1 June 11, 2013 04:37:28am
     Re: Hi please ad my site www.sunika.co.za trportal 0 June 17, 2013 04:35:12am
Kindly crawl my website www.partsandspares.co.za smutyora 0 June 11, 2013 12:12:58am
I can't go to any archived pages! angeldeb82 1 June 10, 2013 03:16:09pm
   Re: I can't go to any archived pages! angeldeb82 0 June 11, 2013 12:19:11pm
I sure was funny back then professorguy 0 June 09, 2013 12:28:22pm
Search the archive for a specific date 837183 0 June 09, 2013 08:56:20am
I can't live web webarchiver354 0 June 08, 2013 12:00:52pm
Need help retrieving content sumar 0 June 07, 2013 08:37:08am
Please add site Samsung Wallpapers http://www.samsung-wallpapers.com jeke 1 June 07, 2013 07:29:07am
   Re: Please add site Samsung Wallpapers http://www.samsung-wallpapers.com jeke 1 June 07, 2013 07:33:01am
     Re: Please add site Samsung Wallpapers http://www.samsung-wallpapers.com smutyora 0 June 11, 2013 12:15:25am
Please add My Oxford English Curso de ingles 0 June 06, 2013 09:20:07am
NYC Step Out 2008 site elline4realNYs 0 June 06, 2013 01:14:23am
Can you crawl my page please? http://solutioncentre.org.uk/ gonzalog2 0 June 05, 2013 01:53:34pm
Please add www.tophold.com tophold 0 June 05, 2013 02:29:37am
Please add my site http://bc4u.ru/ kosten_lks 1 June 02, 2013 09:40:42am
   Re: Please add my site http://bc4u.ru/ webarchiver354 2 June 03, 2013 09:06:34pm
Remove Captured Page from Archive icohen 0 May 30, 2013 11:11:14am
wayback machine has stopped archiving my site zakamin 0 May 30, 2013 04:39:45am
How to remove a single web page posclegom 0 May 29, 2013 10:46:40am
Can i search for specifics? xatsearch 0 May 26, 2013 02:30:40am
Technobox bibinxavier 0 May 26, 2013 12:32:00am
Dead Link on Archive.Org Page Disabled Community dot Org 1 May 24, 2013 07:50:24am
   Re: Dead Link on Archive.Org Page Jeff Kaplan 0 May 24, 2013 12:42:24pm

Terms of Use (10 Mar 2001)